Using the Bootstrap in a Two-Stage Nested Complex Sample Design

Author

  • Steven Kaufman
Abstract

1.0 Introduction

Replication variance estimation for a two-stage nested sample design is usually implemented by generating replicate samples (weights) that replicate the original first-stage sample selection. Since the second stage is nested, the second-stage variance can be reflected by associating each second-stage unit with its respective first-stage unit in each first-stage replicate sample. The second-stage sampling can be viewed as being indirectly incorporated into the replicates, because the second-stage sampling is not independently replicated within each replicate. As long as the first-stage sampling rates are not too high, or first-stage sampling is done with replacement, this should provide a reasonable variance approximation. This paper investigates whether generating replicates that directly reflect both the first- and second-stage sampling provides any advantages over replicates that directly reflect only the first-stage sampling.

This paper is particularly interested in the National Center for Education Statistics' (NCES) Schools and Staffing Survey (SASS). In this survey, the first-stage sampling rates can be large. SASS collects data for both the first- and second-stage units. Because of the large sampling rates, a first-stage finite population correction (FPC) is required for the variance estimates of first-stage data. Since estimation frequently requires combining the first- and second-stage data, the replicate weights for estimates based on first-stage units must be consistent with the replicate weights for estimates based on second-stage units. This implies using the first-stage FPC in both variance estimators. However, when the second-stage variance is only indirectly reflected, applying a first-stage FPC can underestimate the second-stage variance component (i.e., a first-stage FPC bias), since the second-stage component is correct without this adjustment. By directly reflecting the second-stage variance in the replicates, this bias can be eliminated.

This paper will present two sets of replicate weights. Both sets will incorporate the same first-stage FPC. One set will indirectly reflect the second-stage variance, while the other will directly reflect it. Results will be presented for high and low first-stage sampling rates, which will provide a measure of the first-stage FPC bias for different-sized FPCs.

To generate a set of replicate weights that directly reflects the sampling at both selection stages, a bootstrap methodology will be used. In that methodology, a first-stage bootstrap sample of size $n_h^*$ is selected within each first-stage stratum $h$. Within each selected first-stage bootstrap unit $i$, a second-stage bootstrap sample of size $m_i^*$ is selected. $n_h^*$ and $m_i^*$ are chosen to provide unbiased first- and second-order moments. Sitter (1992) and Kaufman (2000) provide examples of how this can be done.

To generate replicate weights that indirectly reflect the second-stage variance, a bootstrap method will also be used. This method is the first-stage component of the bootstrap estimator described above (i.e., a first-stage bootstrap sample of size $n_h^*$ is selected, where $n_h^*$ is chosen to provide an unbiased variance estimator for the first-stage sample). The second-stage replicate weight is the product of the first-stage replicate weight, just described, and the conditional second-stage weight given that the first-stage unit is in the sample.
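As a point of reference only, and as a hedged illustration rather than the paper's own formulas (which address systematic PPS sampling), consider the simplest case of stratified simple random sampling without replacement at the first stage. If a bootstrap replicate draws $n_h^*$ units with replacement from the $n_h$ sampled first-stage units in stratum $h$, the bootstrap variance of the replicate stratum mean is $\hat{\sigma}_h^2 / n_h^*$ with $\hat{\sigma}_h^2 = (n_h - 1)\, s_h^2 / n_h$, and matching the FPC-adjusted design variance $(1 - f_h)\, s_h^2 / n_h$ gives

$$\frac{(n_h - 1)\, s_h^2}{n_h\, n_h^*} = \frac{(1 - f_h)\, s_h^2}{n_h} \quad\Longrightarrow\quad n_h^* = \frac{n_h - 1}{1 - f_h},$$

where $f_h$ is the first-stage sampling fraction and $s_h^2$ is the usual (divisor $n_h - 1$) sample variance across the sampled first-stage units. A larger $f_h$ yields a larger $n_h^*$ and hence a smaller bootstrap variance, so the FPC is absorbed into the bootstrap sample size; a non-integer $n_h^*$ can be handled by randomizing between the two adjacent integers so that the expected size equals $n_h^*$.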
To compare the variance estimators, a simulation study is performed. Within each stratum, the first-stage SASS sample is selected systematically with probability proportional to size (PPS). Within each selected first-stage unit, the second-stage sample is selected systematically with equal probability. Kaufman (2001) provides an appropriate systematic PPS FPC under a locally random assumption. By imposing the locally random assumption in the simulation, the appropriate first-stage FPC is known for both variance estimators. The main source of bias is then determined by how this FPC is used in the two sets of replicate weights. Performance is measured by comparing relative errors and coverage rates. To start, the bootstrap procedures are described.

2.0 Bootstrap Distribution Function

In this discussion, the bootstrap is defined in terms of the sampling process rather than in terms of a specific variable of interest (i.e., the object is to …
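The following minimal Python sketch is purely illustrative of the kind of comparison described in Section 1.0. It simplifies the design to simple random sampling without replacement at both stages with equal-sized first-stage units (not SASS's systematic PPS design), and it implements only the indirect, first-stage-only bootstrap with the FPC absorbed into the bootstrap sample size $n^* = (n - 1)/(1 - f_1)$; all names and parameter values are hypothetical, not taken from the paper.

# Illustrative sketch only: SRSWOR at both stages (not SASS's systematic PPS),
# equal-sized first-stage units, and the indirect (first-stage-only) bootstrap
# with the FPC folded into the bootstrap sample size n_star = (n - 1)/(1 - f1).
import numpy as np

rng = np.random.default_rng(12345)

N, M = 200, 50        # population: N first-stage units, M second-stage units each
n, m = 100, 10        # sample: n first-stage units, m second-stage units each
B, R = 200, 500       # bootstrap replicates per sample, Monte Carlo repetitions

# Finite population with modest between-unit and large within-unit variation,
# so the second-stage variance component matters.
psu_effects = rng.normal(0.0, 0.5, size=N)
pop = psu_effects[:, None] + rng.normal(0.0, 3.0, size=(N, M))
true_mean = pop.mean()

f1 = n / N
n_star = (n - 1) / (1 - f1)            # FPC absorbed into the bootstrap size
k = int(np.floor(n_star))

est = np.empty(R)
var_hat = np.empty(R)
for r in range(R):
    psus = rng.choice(N, size=n, replace=False)
    sample = np.stack([rng.choice(pop[i], size=m, replace=False) for i in psus])
    psu_means = sample.mean(axis=1)
    est[r] = psu_means.mean()          # unbiased for true_mean (equal sizes)

    # Indirect bootstrap: resample whole first-stage units, keeping each
    # unit's second-stage data (summarized here by its sample mean) intact.
    boot = np.empty(B)
    for b in range(B):
        # randomize between adjacent integers so the expected size is n_star
        size_b = k + int(rng.random() < (n_star - k))
        idx = rng.integers(0, n, size=size_b)
        boot[b] = psu_means[idx].mean()
    var_hat[r] = boot.var(ddof=1)

true_var = est.var(ddof=1)             # Monte Carlo "truth" for Var(estimator)
rel_err = (var_hat.mean() - true_var) / true_var
cover = np.mean(np.abs(est - true_mean) <= 1.96 * np.sqrt(var_hat))
print(f"relative error of the variance estimator: {rel_err:+.3f}")
print(f"nominal 95% confidence-interval coverage: {cover:.3f}")

Because this indirect bootstrap rescales both variance components by $(1 - f_1)$ while only the first-stage component should receive the FPC, the reported relative error should come out clearly negative here (the second-stage component is understated by roughly the factor $1 - f_1$), and the confidence-interval coverage falls below the nominal 95%. This is the first-stage FPC bias that the direct two-stage bootstrap is intended to remove.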

Publication: Joint Statistical Meetings, Section on Survey Research Methods, 2002